Basics of R

A Statistical Computing Environment

Julianne Clina

University of Kansas Medical Center

June 25, 2025

What are R and RStudio?

What is R?
R is a programming language for statistical computing and graphics.

What is RStudio?
RStudio is an Integrated Development Environment (IDE): software that provides a user-friendly interface for writing, organizing, and running R code.

Analogy
R is like a car engine and RStudio is like a car dashboard.

Components of RStudio

Console
The place where you can type and run R commands directly. It shows immediate results and error messages.

Source Editor
Where you write, edit, and save your R scripts or programs. You can run parts or all of your code from here.

Environment/History
Shows the data objects (like variables, data frames) currently in memory and keeps a history of commands you’ve run.

Plots
Displays graphs and charts created by your R code for visual data analysis.

Files/Packages/Help/Viewer
- Files: Manage your project files and folders.
- Packages: Install and load add-ons that extend R’s functionality.
- Help: Access documentation and help files.
- Viewer: Display web content or interactive visualizations within RStudio.

Terminal
A command-line interface where you can run system commands or use other programming languages alongside R.

Starting a Project in RStudio

Starting a Project
- Keeps files and work organized by project.
- Prevents mixing data or scripts from different analyses.
- Makes sharing and reproducing work easier.

What is the Working Directory?
- The folder where R reads and saves files by default.
- Acts as R’s “home folder” during your session.

How the Working Directory Works with Projects
- When you start a new project, RStudio sets the working directory to that project’s folder.
- Any files you create or export are saved there by default.
- This helps keep all your project files together and easy to find.

Setting the working directory
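
A minimal sketch of checking and changing the working directory from the Console. The temporary folder returned by tempdir() and the file name example.csv are stand-ins chosen for illustration, not part of any real project. Inside an RStudio project the working directory is already set, so setwd() is rarely needed.

```r
# Where is R currently reading and saving files?
original_dir <- getwd()
print(original_dir)

# Switch to another folder (tempdir() stands in for a project folder)
setwd(tempdir())

# A file written with a relative path lands in the new working directory
writeLines("participant_id,bmi", "example.csv")
file.exists(file.path(getwd(), "example.csv"))

# Return to the original folder when finished
setwd(original_dir)
```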

Packages and Functions in R

What is a Package?

  • A set of functions, data, or code designed to accomplish specific tasks

What is a Function?

  • Functions take input, do something, and return output

Analogy: a package is like a toolbox, and a function is a specific tool in that box.
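
To make the toolbox idea concrete, the sketch below attaches a package and calls one of its functions. It uses the stats package only because it ships with every R installation; the heart_rates values are made up for illustration. The same pattern should apply to packages such as dplyr once they are installed.

```r
# Open the toolbox: attach the package for this session
library(stats)

# Use a tool from the box: median() is a function in the stats package
heart_rates <- c(62, 75, 80, 91, 68)
median(heart_rates)          # 75

# The package::function form names the toolbox explicitly and works
# even when the package has not been attached with library()
stats::median(heart_rates)   # 75
```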

Essential Packages

| Package | Purpose | Example Functions |
|---|---|---|
| dplyr | Data wrangling | filter, select, mutate |
| tidyverse/tidyr | Data organization | pivot_longer, drop_na |
| ggplot2 | Data visualization | geom_point, facet_wrap |
| readr | Data import | read_csv |

Basics of Using a Function

  • To use a function, you have to give it the information (arguments) it needs to complete its task
  • To find out what it needs, use the help function help(function_name) or its shortcut ?function_name. You can also click the “Help” tab
  • The help page shows the function's package, description, usage, arguments, and details
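
As a rough illustration of looking up what a function needs, the sketch below uses sd() from the stats package as the example function; any other function name could be substituted.

```r
# Open the documentation page for sd()
help("sd")

# Show the function's arguments without leaving the Console
args(sd)

# List the argument names programmatically
arg_names <- names(formals(sd))
arg_names   # "x" "na.rm"
```

The argument list shows that sd() needs a vector x and accepts an optional na.rm setting for missing values.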

Making Your Very Own Function!

  • Writing your own functions saves time when you repeat the same task over and over
  • A function has four components: the function name, the input (arguments), the code that says what the function should do, and the output
library(dplyr)

fahrenheit_to_celsius <- function(f_temp) {
  c_temp <- (f_temp - 32) * 5/9
  return(c_temp)
}
fahrenheit_to_celsius(98.6)
[1] 37
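# wt is weight in kilograms, ht is height in meters
get_bmi_category <- function(wt, ht) {
  bmi <- wt / ht^2
  # Conditions are checked in order, so each cutoff only needs an upper bound;
  # this avoids gaps such as BMI values between 24.9 and 25
  category <- case_when(
    bmi < 18.5 ~ "Underweight",
    bmi < 25 ~ "Normal weight",
    bmi < 30 ~ "Overweight",
    bmi >= 30 ~ "Obesity"
  )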
  return(paste0("BMI = ", round(bmi, 1), " (", category, ")"))
}
get_bmi_category(85, 1.54)
[1] "BMI = 35.8 (Obesity)"
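
Because arithmetic in R is vectorized, a function like this can usually convert many values in one call. A minimal sketch, redefining the temperature function so the block runs on its own; the temps_f values are chosen for illustration.

```r
# Same conversion as before, written without an explicit return()
fahrenheit_to_celsius <- function(f_temp) {
  (f_temp - 32) * 5 / 9
}

# Pass a whole vector of temperatures instead of a single number
temps_f <- c(32, 212, -40)
temps_c <- fahrenheit_to_celsius(temps_f)
temps_c   # 0 100 -40
```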

Package Cheat Sheets

  • tidyverse
  • ggplot2
  • dplyr

Updating RStudio and Packages

Why even bother?

  • Get new features, bug fixes, and better performance
  • Ensure compatibility with newer packages
  • Stay consistent with collaborators and tutorials

When you open RStudio, the startup message in the Console shows which version of R you are using

Update RStudio

  1. Visit https://posit.co/download/rstudio/
  2. Choose your operating system (Windows or Mac)
  3. Download and install the latest version

Update Your Packages

Updating packages makes sure you are working with the latest versions, which can include fixes for old bugs. Occasionally, if you are in the middle of an analysis or project, updating a package might not be desirable because results or function behavior can change.

Tools → Check for Package Updates → Select and update

OR

Use: update.packages(ask = FALSE) to update everything, or install.packages("package_name", dependencies = TRUE) to reinstall the latest version of a single package
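
Before updating, it can help to check which versions are currently installed. A small sketch using the stats package as the example because it ships with R; the minimum version "3.5.0" is an arbitrary value chosen for illustration.

```r
# Version of R itself
R.version.string

# Version of an installed package
packageVersion("stats")

# Compare against a minimum version that a tutorial might require
meets_minimum <- packageVersion("stats") >= "3.5.0"
meets_minimum

# old.packages() lists installed packages that have newer versions
# available; it needs an internet connection, so it is not run here
```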

Data Types in R

Fundamental Data Types

  • In R, a variable can hold one of several types of data
  • The data type determines which operations and functions can be used on the variable

Data Types We Might Use

| Data Type | Example | Use |
|---|---|---|
| numeric | 120, 34.2 | blood pressures, bmi |
| integer | 15, 26, 0003 | steps, number of siblings, participant ID |
| character (string) | placebo, tall | randomization arm, category |
| logical (boolean) | TRUE, FALSE | survival, presence of health condition |
# Numeric data: BMI
bmi <- 24.8
class(bmi)
[1] "numeric"
# Integer data: Participant ID
# The L suffix makes the value an integer; leading zeros are not stored, so id is 3
id <- 0003L
class(id)
[1] "integer"
# Character data: randomization group
treatment_arm <- "placebo"
class(treatment_arm)
[1] "character"
# Logical data: presence of hypertension
has_hypertension <- FALSE
class(has_hypertension)
[1] "logical"

Why Data Types Matter in R

  • Knowing your data types helps prevent bugs and weird results
  • Functions behave differently depending on type
  • Mistakes often create errors (and that’s OK!)
# Numeric vs. character: Addition
num1 <- 5
num1 + 2 
[1] 7
num2 <- "5"
num2 + 2        
Error in num2 + 2: non-numeric argument to binary operator
# Logical used in math
x <- TRUE
y <- FALSE
sum(c(x, y, TRUE))
[1] 2
# Sorting: numeric vs. character
ages <- c(15, 9, 2)
sort(ages)  
[1]  2  9 15
as_char <- as.character(ages)
sort(as_char)   
[1] "15" "2"  "9" 
# Missing Data 
# Numeric NA
num_values <- c(100, NA, 200)
mean(num_values)          
[1] NA
mean(num_values, na.rm=TRUE)
[1] 150
# Logical NA
logical_values <- c(TRUE, NA, FALSE)
sum(logical_values) 
[1] NA
sum(logical_values, na.rm=TRUE)   
[1] 1
# Character NA
char_values <- c("yes", NA, "no")
paste("Answer:", char_values)  
[1] "Answer: yes" "Answer: NA"  "Answer: no" 
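
When a value has the wrong type, converting it usually resolves the problem. A brief sketch that reuses the examples above; the "unknown" entry is a made-up value to show what happens when text cannot be converted.

```r
# Convert character to numeric before doing math
num2 <- "5"
as.numeric(num2) + 2            # 7

# Convert back to numeric so sorting follows numeric order
as_char <- c("15", "9", "2")
sort(as.numeric(as_char))       # 2 9 15

# Text that is not a number becomes NA (R also raises a warning,
# suppressed here to keep the output short)
converted <- suppressWarnings(as.numeric(c("120", "unknown")))
is.na(converted)                # FALSE TRUE
```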

Vectors in R

  • A vector is the most basic data structure in R.
  • Vectors are everywhere in R. For example, each column in a data frame is a vector.
  • A vector is a sequence of values that all have the same data type. If you mix types, R converts (coerces) every value to a single common type.
v1 <- c(1:5)
is.numeric(v1)
[1] TRUE
v2 <- c("cat", "house", "bunny")
is.character(v2)
[1] TRUE
v3 <- c(v1, v2)
v3
[1] "1"     "2"     "3"     "4"     "5"     "cat"   "house" "bunny"
is.character(v3)
[1] TRUE
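
The coercion shown above follows a fixed order: logical values become numbers, and anything combined with text becomes text. A short sketch with made-up values:

```r
# Integer and decimal values combine to numeric
class(c(1L, 2.5))        # "numeric"

# Logical values become 1 (TRUE) or 0 (FALSE) next to numbers
c(1, TRUE, FALSE)        # 1 1 0

# One character value turns the whole vector into character
mixed <- c(1, "a", TRUE)
mixed                    # "1" "a" "TRUE"
class(mixed)             # "character"
```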

Factor

  • A factor is a type of vector for categorical data
  • Used when a variable has a fixed set of levels, which may or may not be ordered (e.g., “short”, “medium”, “tall”)
  • Responds differently to certain functions
bloodtype_vector <- c("A", "B", "AB", "O")
summary(bloodtype_vector)
   Length     Class      Mode 
        4 character character 
bloodtype_factor <- factor(bloodtype_vector)
summary(bloodtype_factor)
 A AB  B  O 
 1  1  1  1 
height_vector <- c("tall", "short", "medium", "extra tall", "very short")
sort(height_vector)
[1] "extra tall" "medium"     "short"      "tall"       "very short"
height_factor <- factor(height_vector,
                        ordered = TRUE,
                        levels = c("very short", "short", "medium", "tall", "extra tall"))
sort(height_factor)
[1] very short short      medium     tall       extra tall
Levels: very short < short < medium < tall < extra tall
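
A few more functions respond to the level order of a factor. The sketch below uses a shorter, made-up height vector so the results are easy to follow.

```r
heights <- factor(
  c("tall", "short", "medium", "short"),
  ordered = TRUE,
  levels = c("short", "medium", "tall")
)

# The levels, in the order that was specified
levels(heights)        # "short" "medium" "tall"

# Count how many observations fall in each level
table(heights)

# Each value is stored as the position of its level
as.integer(heights)    # 3 1 2 1

# Ordered factors support comparisons
heights < "tall"       # FALSE TRUE TRUE TRUE
```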

Lists

  • A list can hold elements of different types and lengths (unlike a vector)
  • Useful for storing models, results, or datasets together.
my_list <- list(
  name = "Bill",
  age = 35,
  scores = c(92, 87, 95),
  passed = TRUE
)
my_list
$name
[1] "Bill"

$age
[1] 35

$scores
[1] 92 87 95

$passed
[1] TRUE
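
Once a list exists, its elements can be pulled out by name. A minimal sketch that rebuilds the same list so it runs on its own; the site element and its value are made up for illustration.

```r
my_list <- list(
  name = "Bill",
  age = 35,
  scores = c(92, 87, 95),
  passed = TRUE
)

# $ and [[ ]] both return the element itself
my_list$name               # "Bill"
mean(my_list[["scores"]])

# Single brackets return a smaller list, not the element
class(my_list["age"])      # "list"
class(my_list[["age"]])    # "numeric"

# Add a new element by assigning to a new name
my_list$site <- "clinic_a"
names(my_list)             # "name" "age" "scores" "passed" "site"
```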

Data Structures

What is a Data Frame?

  • The basic R structure for storing tabular data
  • Built into base R
  • Converted strings to factors by default before R 4.0.0; in current versions strings stay as character unless stringsAsFactors = TRUE is set
  • Can sometimes change column types automatically
  • Allows partial matching of column names (can cause subtle bugs)

What is a Tibble?

  • A tibble is a data frame (with a modern twist!)
  • Part of the tidyverse
  • Never converts strings to factors, never changes variable names, never creates row names
  • Prints cleaner summaries (only the first 10 rows and the columns that fit on screen)
  • Better handling of large or complex data
  • Does NOT allow partial matching, which makes it safer and more predictable

Partial Matching Example:

Sometimes, base R data frames will let you access columns by partially matching the name — which can lead to silent errors.

Tibbles require exact column names, preventing this problem.

library(tibble)

df1 <- data.frame(experiment_group = c("control", "treatment"))
df1$experiment  
[1] "control"   "treatment"
tb1 <- tibble(experiment_group = c("control", "treatment"))
tb1$experiment   
NULL
# Example where partial matching can cause bugs:
df <- data.frame(
  age = c(25, 30, 45),
  average_height = c(55, 62, 50)
)
df$grad_age <- df$ave + 5  # "ave" silently matches average_height, no warning
print(df)
  age average_height grad_age
1  25             55       60
2  30             62       67
3  45             50       55
tb <- tibble(
  age = c(25, 30, 45),
  average_height = c(55, 62, 50)
)
tb$grad_age <- tb$ave + 5  # no column is named "ave", so the tibble raises an error
Error in `$<-`:
! Assigned data `tb$ave + 5` must be compatible with existing data.
✖ Existing data has 3 rows.
✖ Assigned data has 0 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in `vectbl_recycle_rhs_rows()`:
! Can't recycle input of size 0 to size 3.
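
Base R data frames can also be accessed with exact matching by using double brackets instead of $. A small sketch using the same data as above:

```r
df <- data.frame(
  age = c(25, 30, 45),
  average_height = c(55, 62, 50)
)

# $ matches partially: "ave" finds average_height
df$ave                   # 55 62 50

# [[ ]] matches exactly by default, so a partial name returns NULL
df[["ave"]]              # NULL
df[["average_height"]]   # 55 62 50

# Checking the name first makes the mistake visible
"ave" %in% names(df)     # FALSE
```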
# Create and print normal data frame and tibble for comparison
df3 <- data.frame(id = 1:24, name = rep(c("Ann", "Ben", "Cam"), 8))
class(df3)
[1] "data.frame"
df3
   id name
1   1  Ann
2   2  Ben
3   3  Cam
4   4  Ann
5   5  Ben
6   6  Cam
7   7  Ann
8   8  Ben
9   9  Cam
10 10  Ann
11 11  Ben
12 12  Cam
13 13  Ann
14 14  Ben
15 15  Cam
16 16  Ann
17 17  Ben
18 18  Cam
19 19  Ann
20 20  Ben
21 21  Cam
22 22  Ann
23 23  Ben
24 24  Cam
tb3 <- tibble(id = 1:24, name = rep(c("Ann", "Ben", "Cam"), 8))
class(tb3)
[1] "tbl_df"     "tbl"        "data.frame"
tb3
# A tibble: 24 × 2
      id name 
   <int> <chr>
 1     1 Ann  
 2     2 Ben  
 3     3 Cam  
 4     4 Ann  
 5     5 Ben  
 6     6 Cam  
 7     7 Ann  
 8     8 Ben  
 9     9 Cam  
10    10 Ann  
# ℹ 14 more rows
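
When working with a base data frame, head() and str() give a similarly compact view without printing every row. A minimal sketch using the same data as above:

```r
df3 <- data.frame(id = 1:24, name = rep(c("Ann", "Ben", "Cam"), 8))

# Show only the first 10 rows
first_rows <- head(df3, 10)
first_rows

# Show the dimensions and the type of each column
dim(df3)    # 24 2
str(df3)
```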

GitHub and CRAN

| Source | Description | When to Use | Install With |
|---|---|---|---|
| CRAN | Official R package repository | Most stable and tested version | install.packages("pkgname") |
| GitHub | Developer’s source code (often in progress) | Get latest features or unreleased updates | devtools::install_github("user/pkgname") |

Think of CRAN like the App Store — safe, reviewed, and stable.
Think of GitHub like the developer’s lab — early access, but maybe still being tested.

Install from CRAN:
install.packages("ggplot2")

Install from GitHub:
install.packages("devtools")
devtools::install_github("bhelsel/RLAB")

GitHub packages may require additional setup or dependencies.
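
To avoid reinstalling a package that is already present, a script can check first. In the sketch below, is_installed() is a hypothetical helper written for this example, not a function from R or any package; it only reports what is missing and does not install anything.

```r
# Hypothetical helper: TRUE if the package can be loaded on this computer
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# stats ships with R, so this is always TRUE
is_installed("stats")

# Report whether a package still needs to be installed
pkg <- "ggplot2"
if (!is_installed(pkg)) {
  message(pkg, " is not installed; run install.packages(\"", pkg, "\")")
}
```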